Maximum entropy methods for biological sequence modeling

Authors

  • Eugen C. Buehler
  • Lyle H. Ungar
Abstract

Many of the modeling methods used for natural languages, specifically Markov models and HMMs, have also been applied to biological sequence analysis. In recent years, natural language models have been improved by using maximum entropy methods, which allow information based on the entire history of a sequence to be considered. This contrasts with Markov models, which have been the standard for most biological sequence models and whose predictions are generally based on some fixed number of previous emissions. To test the utility of maximum entropy modeling for biological sequence analysis, we used these methods to model amino acid sequences. Our results show that there is significant long-distance information in amino acid sequences and suggest that maximum entropy techniques may be beneficial for a range of biological sequence analysis problems.
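The contrast drawn in the abstract can be illustrated with a minimal sketch of a conditional maximum entropy (log-linear) model. This is not the authors' feature set or training procedure: the two feature types, the function names `features` and `maxent_distribution`, and the hand-set weights are all assumptions chosen for illustration. The bigram feature sees only the previous residue, as a first-order Markov model would, while the `seen_before` feature depends on the entire history.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def features(history, candidate):
    """Return the binary features active for a (history, candidate) pair."""
    active = []
    if history:
        # Short-range feature: depends only on the previous residue,
        # which is all a first-order Markov model can see.
        active.append(("bigram", history[-1], candidate))
    if candidate in history:
        # Long-range feature: depends on the entire history.
        active.append(("seen_before", candidate))
    return active


def maxent_distribution(history, weights):
    """p(a | history) is proportional to exp(sum of active feature weights)."""
    scores = {
        a: sum(weights.get(f, 0.0) for f in features(history, a))
        for a in ALPHABET
    }
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant
    return {a: math.exp(s) / z for a, s in scores.items()}


# Hypothetical hand-set weights, purely illustrative (not fitted to data).
weights = {("seen_before", "W"): 1.5, ("bigram", "G", "P"): 0.8}
dist = maxent_distribution("MWKLLG", weights)
```

In a real model the weights would be fitted so that the expected feature counts under the model match those observed in training sequences; here they are fixed by hand only to show how a long-range feature can shift the predicted distribution.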

Similar resources

Statistical Models for the Analysis of Heterogeneous Biological Data Sets

Eugen Buehler, Lyle Ungar. The focus of this thesis is on developing methods of integrating heterogeneous biological feature sets into structured statistical models, so as to improve model predictions and further understanding of the complex systems that they emulate. Combining data from different sources is an important ta...

Modeling and Performance of Waste Tires as Media in Fixed Bed Sequence Batch Reactor

Introduction: Modeling aims to simulate or optimize a process in physical, chemical or biological environments, and the derived model, if sufficiently suitable, can considerably assist in generating data and predicting unknown conditions. Unsuitable disposal and elimination of waste tires have polluted the environment and human living areas; they have also caused removal of a hu...

A Note on the Bivariate Maximum Entropy Modeling

Let X = (X1, X2) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for the vector X when there is partial information about the dependence structure between X1 and X2. The models, which are obtained based on the well-known Principle of Maximum Entropy, are called the maximum entropy (ME) mo...

Modeling of the Maximum Entropy Problem as an Optimal Control Problem and its Application to Pdf Estimation of Electricity Price

In this paper, continuous optimal control theory is used to model and solve the maximum entropy problem for a continuous random variable. The maximum entropy principle provides a method for obtaining the least-biased estimate of a probability density function (Pdf). In this paper, to find a closed form solution for the maximum entropy problem with any number of moment constraints, the entropy is consi...

REBMEC: Repeat Based Maximum Entropy Classifier for Biological Sequences

An important problem in biological data analysis is to predict the family of a newly discovered sequence, such as a protein or DNA sequence, using the collection of available sequences. In this paper we tackle this problem and present REBMEC, a Repeat Based Maximum Entropy Classifier of biological sequences. Maximum entropy models are known to be theoretically robust and yield high accuracy, but ar...

Publication date: 2001